
Mastering Databases with Postgres

In this video we're going to talk about databases. When you're dealing with backend systems, interacting with and handling databases is one of the most important and most frequent operations you'll perform, so understanding the concepts around it is crucial to being effective at your job. Let's start with the question: why do we need databases in the first place?

At its core, a database is simply a way to persist information across different sessions. Persistence means storing data so that it survives even after the program that created it has stopped. Think of a to-do list app: you add entries and check off the tasks you've finished; that's the basic functionality you expect from a to-do app. When you create or check off an item, close the app, and open it again, you find all your information in the same state you left it. That's what we mean by persistence: the information is still there, in the expected state, even after a considerable amount of time has passed or across different physical locations. Without persistence, every time you opened the app you'd have to create your to-do items from scratch and you'd lose all the progress on whatever tasks you had created before. Persistence is something we rely on constantly in day-to-day life. Now, what do we mean when we say the word "database"? The term

"database" is surprisingly broad. In the simplest sense, any kind of structured storage can be considered a database. Your phone's contact list is a database. If you're a developer, you're probably familiar with localStorage, which every browser offers: if we open the developer tools on a site like Excalidraw and go to the Application tab, we can see that the site has stored some entries as a key-value store; there's also session storage, cookie storage, and so on. All of these storage mechanisms can be considered databases. Even a simple text file where you jot down notes to refer to later is a very basic database. If you derive the common pattern from all these examples, a database is some kind of persistent system that offers ways to create, read, update, and delete data from

it. That's the very general sense of the word, which is broader than what we mean in the context of backend systems. There, in the typical developer context, "database" almost always means disk-based databases, where "disk" means a hard drive, an SSD, or whatever modern storage technology is available. Why disk-based? Because disk storage, whether a traditional hard drive or a modern solid-state drive, is relatively cheap compared to other kinds of storage. Compare it with RAM, also called main memory or primary memory: if you're from a computer science background, you know the CPU can access primary memory (RAM) as well as secondary memory, the disk-based storage backed by a hard drive or an SSD. RAM is very fast; it's called

main memory for a reason, and it is much faster than the disk-based alternative. The problem is that RAM is relatively expensive, and we don't get nearly as much of it as we do disk. Check your system's configuration, whether it's Windows, Linux, or Mac: most machines have 8 GB, 16 GB, or 32 GB of RAM; on the higher end of consumer hardware, maybe 64 or at most 128 GB. But hard disks are cheap these days, so most laptops ship with anywhere from 512 GB up to 2 TB of disk storage. You can see the pattern: we have a limited amount of RAM (primary memory) but plenty of space on disk (secondary memory). The trade-off is that RAM is very fast for reading and writing data, while disk-based storage is relatively slow because of how data is stored and fetched. That's the reason, when we use

caching mechanisms, whether Redis or in-memory caches, we're talking about storage in RAM: fetching or saving data in primary memory is much faster than hitting a disk-based database (secondary storage). For databases, though, what matters most is space, and we can trade away a fair amount of speed to get it. That's why most traditional databases, relational and non-relational alike, keep their data on disk. The on-disk format, the algorithms around it, and so on are a technically deep area that we won't cover in this video; we want to keep this relevant for a backend engineer at the application level. The only thing you need to know is: caching technologies like Redis store their data in primary memory (RAM), while traditional databases, relational or non-relational, like Postgres or MongoDB (which we'll cover next), store their data on disk (secondary memory), because

disk-based storage offers more capacity at a lower price, at the cost of speed. So to be clear: we're talking about databases as disk-based storage, in the context of backend systems. Now we come to the term DBMS, or database management system. What is a database management system? Just dumping our data onto some kind of disk-based storage isn't enough; we also need ways to retrieve that data, change it, and delete it. All of these operations have to be done efficiently, since we may be talking about hundreds or thousands of gigabytes of data. So we need the create, read, update, and delete operations, and we need them to be fast.

That's exactly why we have software systems called DBMSs, whose core job is to store the data and to efficiently provide these CRUD operations to clients and users. Of course they have plenty of other responsibilities too, such as security, scaling the database system, load balancing, and so on, but at a very high level there are two: storing the data and providing operations on it, primarily CRUD (create, read, update, delete). So, a few responsibilities of a DBMS worth calling out. First, data organization: it needs to organize the data efficiently so that fetching, updating, and inserting are all fast. Second, access: as already mentioned, it has to provide methods for the CRUD operations. Third, integrity: a technical term that basically means the accuracy and validity of the data. We'll see what these buzzwords mean when we get into examples, but for now it just means the data is correct: whatever we store is valid and not corrupt. To give a very high-level

example: take an e-commerce platform where we store customers' order details, and each order carries payment information, such as the order amount. In the database we store the payment amount as a number, and it's the job of our DBMS software to maintain the integrity of that data so that nobody can insert anything else into that field. If the payment field accepts only numeric data, then when someone tries to insert a string, say "something", the insert should fail; the DBMS must not allow it, because maintaining the integrity, the correctness and validity of the data, is its responsibility.

Integrity carries a lot of meanings in a lot of contexts, but this is the core of it: the data has to be accurate and consistent. Fourth, security, which basically means protecting the data from unauthorized access. DBMS software supports different users, different roles, and so on precisely for the sake of protecting access to the data. So that's a high-level overview of a database management system's responsibilities. Now, here's a question: why do we need DBMS software at all? Why not

use simple text files, storing all our data as plain text inside them? In fact, before people came up with database management systems, this is how they started: they stored all the data in text files. The problems with that approach fall into a few categories. First, parsing. Say you keep a customer database, with all your customers' details, as plain text in a file. Every time you want to find a specific customer, you have to write application code, in whatever language your backend uses (Python, Java, JavaScript, Go, Rust), to parse the text file, split the

lines, and compare each field, entry by entry. That whole process is very slow. If you've done file parsing in different languages, you know that a language like Rust is relatively fast at these operations, while languages like JavaScript, Python, or Ruby are much slower by comparison, so on top of everything else you take a big performance hit. Parsing like this is also very error-prone: if something goes wrong you can corrupt your data or serve wrong information to your customers. Second, there is no structure. Text files have no formal schema; you can't require that data in the file follow a particular shape. A text file is completely fluid: you can dump any amount of text in any format, and that makes it very difficult to enforce consistency in

the data. You can't enforce a rule like "this field only holds numbers": since everything in the file is just a string, it will accept anything, so it can't promise any kind of data consistency. Third, concurrency. What happens if two people try to update the same value at the same time? Whose update counts, and whose gets discarded? With a plain text file, whoever writes last wins. Say the file holds an entry, amount = 40, and two people want to modify it. The first thing each of them does is read the file, since they want to adjust the current value, not write an arbitrary one. One person wants to increase the amount by 20, the other wants to decrease it by 20. Both read at the same time, so both start from 40. The first computes 40 + 20 = 60 and saves it; the second computes 40 - 20 = 20 and saves that. Even though they started together, the writes land in an order you can't predict; it depends on CPU scheduling and a thousand other parameters. So depending on the environment, the final value ends up as 60 or as 20, when applying both changes should give 40. There's no consistency. That's the concurrency problem: when two users modify the same data at the same time, you need concurrency-control mechanisms inside the database software to manage those interactions efficiently and accurately and produce a consistent result. A simple text file simply can't do that. These are the challenges people faced, and if you tried the same approach today you'd hit them too once your data grows. Because of all these limitations of simply storing your data in a text file, people came up with dedicated DBMS software.

Now that we're talking about DBMS software, we've finally reached the point where we can talk about the different types of DBMS. At a very high level there are two major kinds: relational, and its opposite, non-relational. A relational database organizes data into tables of rows and columns, and relationships between different tables are defined using concepts like foreign keys. One of the key features of relational databases is that data is structured and inserted against a predefined schema: you can't arbitrarily push any shape of data into the database. The data must fit a corresponding table, and that table has a strict schema: you define all the columns and their data types up front. Everything is decided beforehand; you can't make it up on the fly. It's a strict system, and the advantage that strict schema enforcement buys you is data integrity.

At any point in time you can rely on the state of your data: you know the data type of every column, the relationships between tables, and so on. The data in your tables is always in a consistent, accurate state. To interact with a relational database we generally use SQL, the Structured Query Language. Some relational databases you've likely already heard of: MySQL, PostgreSQL, SQL Server, and many others. In the same way we have non-relational databases; you've probably heard of MongoDB, one of the most famous in that space. The difference between the two: while relational databases force you to have a consistent,

predefined schema before you can put data into your database, non-relational databases enforce nothing of the sort: you can store any kind of data, and two entries in the same collection can differ. (We don't use the word "table" in the non-relational world; in a side-by-side comparison, a table in MongoDB is called a collection, and where a relational table holds rows, a collection holds documents.) In the relational world every row has the same structure; in the non-relational world each document can follow a different structure. That's the primary advantage, and sometimes the disadvantage, of the NoSQL world. The obvious advantage is a very flexible schema: if you're building a prototype and want to move fast without spending time designing, enforcing, and maintaining a database schema, you can reach for something like MongoDB, push data on the fly, and fetch whatever you need. That flexibility can be a win in some scenarios. To compare the two, let's take an example: a CRM, a customer relationship management system. Inside a CRM, what

usually happens is that it needs to maintain accurate, consistent data about customers: contacts, sales-opportunity details, and so on. That kind of critical data is better suited to a relational system; a relational database like Postgres is a very good choice because it provides strong data integrity and supports complex queries over the relationships between customers. So a CRM is a good candidate for the relational side when you're deciding between a relational and a non-relational database. For a non-relational use case, take a CMS, a content management system. A CMS is used to serve content from a central place to different sites. Say you run a blogging platform: every time you want to add a new article, you don't want to open your code base, write a new file, push it to GitHub, and redeploy. Instead, your website can pull that content dynamically from a CMS (Sanity is one example; there are many on the market) and render it. To publish, you just log into the CMS, write the article in Markdown format, and save it; the

next time someone refreshes your website, the new article is fetched. In a use case like this, the content the CMS needs to store isn't really structured: an article can contain an image, a code block, a YouTube embed, and so on. When you don't know up front all the different types of data you'll need to store, and you just want to take everything and dump it, a non-relational database like MongoDB makes a lot of sense. But even though databases like MongoDB offer a lot of flexibility, they also present challenges for data integrity, which is a very important property of a database system. Because they often lack the strong constraints and relationships of relational databases, it's easier to introduce inconsistencies into the data. Since the schema and integrity aren't enforced at the database level, you have to enforce them at the application level, which adds complexity to your code, and it's more error-prone, because application code changes often and it's very easy to miss something and introduce new bugs. So now we have to make a choice: we

are going to learn about databases, and there are a lot of options on the market. We have relational and non-relational; within relational we have MySQL, Postgres, SQLite, SQL Server, and a lot of other products, and all of them offer broadly the same features and can scale in production systems. So which database should we proceed with? In a situation like this, going with Postgres makes a lot of sense, for a few reasons. One, it's open source and free: it isn't proprietary software, you can go read the Postgres source code if you like (even if you never actually will), and a lot of companies prefer open-source software so they can host and deploy it on their own premises and servers. Two, the Postgres DBMS sticks to

the SQL standard, so you can take a standard SQL query and it will behave the same way on Postgres as it does on a different system like MySQL or SQL Server. That matters if you ever want to migrate your database: say you need to move to MySQL for some reason. Since Postgres is SQL-compliant and you've written your migrations, your CREATE and UPDATE statements, and your queries in standard SQL, you can switch databases with only minor changes. Easy migration is one more reason to go with Postgres. Three, it is very extensible, which means it offers a huge number of features. If you go to the

Postgres documentation, it runs to around 1,400 pages and covers pretty much every use case a typical SaaS will encounter, and it has a very good extension system, so you can customize it to your own needs. Four, it's known for its reliability and scalability. Five, and I think this is the one that makes the decision easy: it has very good JSON support. As we just discussed, one of the primary reasons to reach for something like MongoDB, a non-relational database, is that it accepts any kind of data, and by "any kind of data" we mostly mean JSON, because JSON can represent pretty much anything: numbers, strings, nested objects, arrays, and so on. You can store any kind of JSON

as a document in a NoSQL database like MongoDB. But since Postgres offers a JSON data type, with very good indexing and query capabilities for JSON fields, there's no reason to pick a different database just for dynamic data: you can use Postgres with JSON fields for that. Think back to the content management system example: if you receive content from users without a strict schema, you can define a JSON column, dump whatever comes in, and fetch and render it later. There's no need to switch to a non-relational database just for the needs of your dynamic data. Because of all these features, Postgres is pretty much the number-one choice, at least in what I

have seen: a lot of startups, and plenty of big companies too, stick with Postgres, and it's usually their first choice. If you do your own research you'll find plenty of articles claiming MySQL has various performance benefits, but until you're serving millions of users and optimizing a very specific bottleneck in your application, you don't really need to agonize over MySQL versus Postgres. Given its rich feature set and very good JSON support, Postgres should be your first choice in pretty much all your projects, and that's why everything we learn about databases in this video will be in the context of Postgres. With that, a note: there are already thousands of free resources on YouTube and elsewhere,

plus paid resources on platforms like Udemy, that cover the basics of SQL and PostgreSQL. (To be precise about terms: SQL is the language you use to write queries; Postgres is the database system, the software on which those SQL queries are executed.) There are plenty of full courses on YouTube for both. So instead of repeating all of that, CREATE TABLE, SELECT, ORDER BY, GROUP BY, and the rest of the SQL and Postgres basics, we're going to save some time in this video and stick to the concepts most relevant to backend systems, leaving the basics for you to pick up elsewhere. If you like, pause right now and search YouTube for "SQL basics": you'll find videos ranging from one hour to twenty hours, and the same goes for "Postgres basics". Choose the level of comprehensiveness you need, whether a one-hour video or a couple of days of study, and then come back. That's the one prerequisite before you proceed to the next section.

This is TablePlus, a graphical client for interacting with your databases; it has a modern UI feel, and it's the software I use on a day-to-day basis for all my database queries and exploration. We'll use it throughout this video for the demos. Now, let's start with the data types available in Postgres. I know I just said we wouldn't cover the basics of Postgres and SQL, but this is an important piece of information we should cover now so we don't have to stop for it later, when we're actually talking about our backend system. We won't go deep into each data type; this is a high-level introduction. On screen is a CREATE TABLE query; if you're already familiar with SQL, you'll

recognize it: this is how we create a table in any relational database, whether Postgres, MySQL, or another. Here we're creating a table named data_types_demo, and the fields listed are the columns that every row of the table will have. Starting at the top, we have SERIAL. What does SERIAL mean? It's just an integer data type, but the value auto-increments each time we add a new row to the table: the first row you insert (omitting the id field in your INSERT statement) gets 1 for this column, the next gets 2, then 3, 4, and so on. In practice we use SERIAL or BIGSERIAL; BIGSERIAL is the larger variant, whose maximum value is far higher than SERIAL's, so in production systems we usually go with BIGSERIAL when the id is our primary key. (A primary key is a unique field we can use to identify a particular row in a table.) So SERIAL is an integer that automatically increments with each entry in our table.

Next, SMALLINT, INTEGER, and BIGINT: all three are integers, differing only in capacity, that is, the largest number they can store. The scale runs SMALLINT < INTEGER < BIGINT, so pick one depending on your needs. Similarly, we have DECIMAL and NUMERIC, which in Postgres are identical; you can use either, they behave the same way. So what do the 10 and the 2 in NUMERIC(10, 2) mean? The 2 (the scale) says there will always be two digits to the right of the decimal point. The 10 (the precision) says that across the whole numerical representation you can store at most 10 digits in total. So you can store something like 12345678.90: count them and you have 10 digits, two of them to the right of the point. You could not add another digit, because 10 digits is the maximum across the whole representation.

The question people mostly ask is: what is the difference between decimal or numeric and something like real, double precision, or float? The difference is a little complicated. Floating-point numbers (real, double precision, float, whatever you call them) are represented internally in a very different way from their decimal and numeric counterparts. So the rule of thumb is: if accuracy is an important aspect of a field, always go with decimal. Say you are storing a price. For a field like price, accuracy is very important, because it is going to be involved in a lot of calculations, and a lot can go wrong if different systems end up with slightly different representations of the same number. In a case like that, always stick with decimal or numeric (again, they are interchangeable). Floating-point representations can come out with slightly different values across different systems because of the way they are stored and processed.

On the other hand, say you have a field like size, the area of something, with a value like 567.8987, some fractional number. For a value like that, if 8987 turns into 8986 or 8988, a very small discrepancy in the representation of the number does not make a meaningful difference to your system. For numbers like that, go with floating point. Now you might ask: if decimals are always accurate, why not always use decimals? The reason is that exactness has a cost. Because floating-point values map directly onto fixed-size binary formats that the hardware understands, they are faster to store and much faster to do calculations on. That is why in a lot of scientific-computing domains floating point is preferred over decimal representations. So always evaluate what kind of value you are going to store: if small discrepancies in the representation do not have much impact, go with float, because it is very fast to process and preferable in those scenarios.
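You can see the representational difference right in a query; on recent Postgres versions the float sum picks up a tiny binary-rounding error that the exact numeric sum does not:

```sql
SELECT 0.1::double precision + 0.2::double precision AS float_sum,  -- 0.30000000000000004
       0.1::numeric          + 0.2::numeric          AS exact_sum;  -- 0.3
```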

But if it is a value like price, something critical where accuracy matters a lot, always go with decimal. That covers pretty much all the numeric data types.

Next we have a very interesting category, the string types, and there are three of them: char, varchar, and text. All three store text; that is the similarity between them. The difference: say we declare a column as char(10) and store a value like 'ab' in it. The database will pad eight extra spaces onto that value before saving, because we declared the length as 10 and char is a fixed-length type. That is the defining property of char: whatever length you declared, it does not matter how long the actual value is, it will always store exactly that length, padding with empty spaces if the value is shorter. That is why people came up with another type, varchar, short for character varying.

The way varchar works: when we declare varchar(255), 255 is the maximum length you can store in the field. If you store 'ab', the stored length is two; it will not pad extra spaces out to 255. Store 'abcd' and the length is four, and so on; it adjusts to whatever value is actually inserted into the database. The only rule is that you cannot exceed the maximum of 255; if you do, you get a database-level error.

The last one is text, which is the modern alternative to varchar. Imagine varchar without any length limit: a text column holds text of any length. Technically Postgres caps a single value at about 1 GB, which is very, very long text; we will rarely need anywhere near that, but that is what it means. Now, when you want to store a string, which one should you choose? The first rule: never choose char.

It is a very old standard from way back, so go with varchar or text. The one exception: you can use char if you know beforehand that the length is always going to be exactly the same; it cannot vary. Say you are storing two-letter day-of-week codes, so the values are 'Mo', 'Tu', 'We', and so on. If you have a requirement where you know every value has the same fixed length, a type like char(2) fits, and you will not waste padding on variable-length entries. Only use char for a case like that. For general text, say the name of a customer, the name of a product, a description, and so on, go with either varchar or text.
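A quick sketch of all three behaviors side by side (table name made up for illustration):

```sql
CREATE TABLE string_demo (
    day_code char(2),       -- fixed length: short values get space-padded
    username varchar(255),  -- variable length, at most 255 characters
    bio      text           -- variable length, no declared limit
);

INSERT INTO string_demo VALUES ('Mo', 'alice', 'a bio of any length');
-- storing 'M' in day_code keeps it as 'M ' (padded to length 2)
-- storing more than 255 characters in username raises a database-level error
```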

So how do you choose between varchar and text? My personal take: always go with text. If you explore the Postgres documentation, they themselves recommend text, the modern alternative to varchar without a length. What confuses most people is the belief that varchar(255) performs better than a text column, or that you cannot index a text field (we will cover indexes shortly, but yes, people believe that). There are a lot of misconceptions here, because other databases, like MySQL, SQL Server, and SQLite, have different conventions and operate very differently from Postgres, and when people switch between databases they carry those assumptions over or get misguided. The official Postgres documentation says to go with text: there is pretty much no performance difference between varchar and text in Postgres.

The second point, which is my personal opinion: a length like varchar(255) is a convention inherited from MySQL. MySQL was a very popular database, it had this convention of using varchar(255), and that is why a lot of people still use it.

The number 255 has real meaning in the context of a MySQL database, but in Postgres it is just an arbitrary number with no meaning whatsoever, and people use it without thinking, just because that is the way it has always been done. That is why I personally believe you should use text instead of a magic number like 255. There is one more reason. Say later on you want to increase the length for some use case: you have a field called description declared as varchar(255), and now the requirement grows. To cater to that you have to do a database-level migration, which touches a lot of data and is a lot riskier. Of course migrations are performed on a day-to-day basis, but you should still avoid a database-level migration when you can. With text, you instead enforce whatever length limit you want at the application level.
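To make the migration cost concrete (the table and column here are hypothetical):

```sql
-- with varchar(255), growing the limit later means a schema migration:
ALTER TABLE projects ALTER COLUMN description TYPE varchar(500);

-- with text, no schema change is ever needed for length;
-- the length rule lives in application code instead
```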

Remember, we are not in the data-engineering or data-analyst domains. People in those domains usually deal with SQL directly: they have some interface, they write SQL, they look at the result of the query directly; SQL is their first point of interaction. As backend engineers we usually interact with a SQL database through a driver (there are different drivers depending on the programming language), and we have a lot of application code wrapping the data. That is why we can enforce length or any other constraint in application code, and keep our database migrations and schema statements simpler to read and maintain. Say someone new comes over, looks at the schema, and sees varchar(255). They will think the number has some meaning, even though the person who created it added it without thinking; it is just a random number to them. The newcomer will try to work out where that 255 (or that 10 in a numeric column) came from, and since there is no documentation they will dig through the code, and there is a lot of headache that comes with that, because a specific number like that signals meaning even when it has none. If you go with plain text instead, the migration statements and SQL statements are a lot simpler to read and write, because you do not have to think about lengths at all, and anyone new reading the schema simply sees a text field, a character field, with no extra meaning attached.

So those are the concepts surrounding string data types in Postgres. Moving on, we have boolean, which stores true or false. We have date, for storing just a date, and time, for storing just a time of day in hour-minute-second format. Then there is timestamp, for storing the date and the time together, and timestamptz, timestamp with an extra "tz" at the end, which also stores the time zone information along with the timestamp. We have interval, for storing durations like 10 days or one week. Then we have uuid: UUIDs are a very popular choice for table primary keys because of their random, globally unique nature, and that is why Postgres offers a native uuid type right inside the database.

Then we have json and jsonb. As I mentioned before, Postgres offers native JSON support, and there is an extra type called jsonb, also known as JSON binary. The primary difference: a json column is stored as plain text, the key-value text we usually see when reading JSON; with jsonb, Postgres parses the value and serializes it into its own binary format for efficiency, so it performs better and supports richer query capabilities. This is specific to Postgres, not a SQL standard, and most of the time you should go with jsonb because of the performance advantage it offers. Then we have array types: you can store an array of any data type, an array of integers, of jsonb, of dates, whatever.
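One visible consequence of that parsing step: json keeps your input verbatim, while jsonb normalizes it (duplicate keys collapse, whitespace is rewritten). A quick demonstration:

```sql
SELECT '{"a": 1, "a": 2}'::json;   -- returned exactly as written, duplicates and all
SELECT '{"a": 1, "a": 2}'::jsonb;  -- parsed and normalized: {"a": 2}
```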

Beyond those there are some niche data types, for storing network addresses, MAC addresses, geometric points, XML, and so on, that we do not really use on a daily basis, but if you want to use them you can explore the official documentation for more information.

And this is how we insert data into a table: we write INSERT INTO, the table name, the ordered list of columns we want to insert data into, and then we provide the values in the same order we wrote the columns. In this example you can see one value per type: a smallint, an integer, a bigint (as you can see, much larger than a normal integer), then a decimal, and real and double precision, which are the floating-point representations. Then a char, a varchar, and a text value; a boolean, where we can pass either true or false; a date; a time in hour-minute-second format; a timestamp with the date and the time; a timestamptz with the date, time, and time zone information; an interval; and a uuid, which is what a typical UUID looks like. Then json and jsonb: while inserting, the two look the same, but Postgres serializes the jsonb value into its binary format for faster processing and faster retrieval. Finally an array, and then IP addresses, MAC addresses, geometric points, XML, and so on, all the data types we basically saw.
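The on-screen example is not reproduced here, but a representative version of it might look like this. The table and column names are made up, and gen_random_uuid() assumes Postgres 13 or newer (where it is built in):

```sql
CREATE TABLE type_demo (
    small_val  smallint,
    int_val    integer,
    big_val    bigint,
    price      numeric(10, 2),
    ratio      double precision,
    name       text,
    is_active  boolean,
    created_on timestamptz,
    id         uuid,
    meta       jsonb,
    tags       integer[]
);

INSERT INTO type_demo
    (small_val, int_val, big_val, price, ratio, name,
     is_active, created_on, id, meta, tags)
VALUES
    (32767, 2147483647, 9223372036854775807, 199.99, 3.14159, 'demo row',
     true, now(), gen_random_uuid(), '{"theme": "dark"}', ARRAY[1, 2, 3]);
```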

And that is pretty much all the data types we are going to deal with as backend engineers in a typical database system. Next up, we are going to design the database of a project management platform. In the previous video we modeled the APIs of a project management platform; similarly, to understand the relevant database concepts, not just theory or very advanced concepts we would never use day to day, but the concepts that actually matter in practice, especially in Postgres, we are going to design the database of that platform, and we will also write queries for the different APIs using the same database.

Before we get into that, I want to bring up one topic: migrations. In real systems, especially production systems, you do not just open a tool like TablePlus and start writing queries against the database. You cannot work that way, because if something goes wrong, or even if nothing goes wrong, there is no record of which changes were applied to a particular database over time, or who applied them; there is no way to version-control the state of the database. That is why most teams follow something called database migrations. Migrations are basically files: you have a db folder, inside it a migrations folder, and inside that a set of migration files.

The files are named in sequence, 1.sql, 2.sql, 3.sql, and so on, and each one contains a couple of ordinary SQL statements, nothing fancy: CREATE TABLE, CREATE INDEX, DROP TABLE, and the like. Alongside the files you use a tool, usually a command-line tool; dbmate and golang-migrate are two famous examples. The tool goes to that db/migrations location and walks through the files in sequential order, which is why we name them sequentially, either with a numeric series like 1, 2, 3 or with timestamps (timestamps always increase, so they give a sequence too). The tool goes through the files in that order and applies them, executing the SQL statements in each file against whatever database you have configured in your environment. Within migrations there are also two kinds: up migrations and down migrations.

Some modern approaches to handling databases do not really believe in or implement down migrations, but the standard way of operating has both. Up migrations contain whatever change you want to make: you want a new table, so you write CREATE TABLE; you want a new enum type, so you write CREATE TYPE ... AS ENUM; you want some indexes, so you write those CREATE statements, one by one, in proper SQL or Postgres syntax. Then, depending on the tool, take dbmate for example, the same file, say 1.sql, is divided into two sections by comments: one section for the up migration and another for the down migration. The purpose of the down migration is to revert every change the up migration applied to the database. The reason for having this feature: say you applied some changes to your database and something goes wrong, something breaks in your production system. What you want is to roll back to the previous version of the database, with its state exactly as it was. For that to happen you have to have down migrations, so that all the changes from the up migration can be reverted and you can go back to the previous state.

That is how a migration workflow works. The tool usually creates a bookkeeping table in your database, typically called schema_migrations or similar, which it uses to track the current version of the database. Say you have migrations 1.sql through 4.sql applied; the table will record 4 as the current version, because 4 is the latest change applied to your database. When you come back later and add a new file, 5.sql, the recorded version is still 4 until you apply that migration. So the tool goes through all your SQL files and applies the queries on your database one by one: if it is an up migration it applies the changes, and if it is a down migration it reverts the changes for that version.
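The bookkeeping table varies by tool, but conceptually it is about as small as this (a sketch, not any particular tool's exact DDL):

```sql
-- roughly what the migration tool maintains for you:
CREATE TABLE schema_migrations (
    version text PRIMARY KEY   -- e.g. '20240101120000' or '4'
);

-- "which migrations have already run?"
SELECT version FROM schema_migrations ORDER BY version;
```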

Now, why do we need migrations? There are a couple of reasons. One, as I have already mentioned: you cannot just open a graphical tool and execute SQL statements, because it is very hard to keep track of all the changes you made, unless you save everything to files and commit them to git, at which point you have reinvented your own migration system anyway. So the first advantage is keeping track of database changes over time. You have a folder of migration files committed to your version-control system, GitHub, GitLab, Bitbucket, whatever platform your team follows; your migration statements stay with your code, and over time you keep adding new migration files to the same folder. That is your history of every change that happened to your database schema. The second advantage is rollback: as I already mentioned, if something goes wrong you can roll back to a particular version. And that is pretty much all there is to it. Migrations are the system through which you apply changes to your database: you can add new tables, modify tables, create indexes, create triggers, and so on. Any change you want to perform on the database goes through a new migration, because relational databases follow a strict schema; you cannot just insert data in an arbitrary shape, you have to follow the schema, and migrations are how that schema changes over time, driven by a command-line tool.

Now we will start modeling the database for our project management platform, and for that we will write migrations. I have VS Code open here; this is where we will write all our SQL statements to model the database. For migrations we are going to use dbmate, a command-line migration tool that helps apply changes to your database. We have to provide it the URL of our database, so we create a new environment file, .env, and inside it a variable called DATABASE_URL; the tool will pick up our Postgres connection URL from that location. To create our first migration, dbmate offers a command: run dbmate new create_users_table. Press enter and it creates a new folder called db, inside that another folder called migrations, and inside that a migration file. As I said, migration files can follow a sequential number like 1, 2, 3 or timestamps, whichever sequence you prefer; in this case dbmate prepends a timestamp, followed by the name of our file, create_users_table. For now the file is empty; we will add our migration statements there pretty soon.

As I mentioned, the file has two sections: whatever SQL statements we write after the down-migration comment are considered our down migration, and whatever we write after the up-migration comment (and before the down section) are considered our up migration.
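dbmate marks the two sections with special comments; a freshly filled-in file looks roughly like this (the table is just an example):

```sql
-- migrate:up
CREATE TABLE users (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid()
);

-- migrate:down
DROP TABLE users;
```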

To save some time I will paste in all the SQL needed to perform this migration, the up migration that creates all the necessary tables, and then we will go through the statements one by one to understand the concepts surrounding them. The first thing you will notice is that we are creating some data types called enums. Enums are for values with a predefined set of allowed options. For example, we have a field called status in the projects table, and for that we only want to allow three values: the project status can be active, completed, or archived. In the same way, a task's status in the tasks table can be pending, in progress, completed, or cancelled; and a member can have the role owner, admin, or member. Why use an enum instead of just going with standard text? The first reason is data integrity. In the context of databases, data integrity means we hand the responsibility of checking the validity and accuracy of the data to the database itself, instead of our application code. We created these enum types and then used them.

We created a type called project_status and, as you can see, used it right here as the type of the status field in the projects table. Because of that, whenever we insert or update data in that status field, if we try to put in some random string, anything other than active, completed, or archived, we will get a database-level error. We do not have to check this in our application code; we can leave that responsibility to the database. We can still check it in application code as well (that just narrows the window for errors further), but to make things more robust and more secure, the database itself holds the set of allowed values, and every insert or update of that field is checked against them: if the value we provide for the project's status is not one of these, we get a database-level error. So the first benefit is data integrity. The second is documentation, and I think this second point is more important than the first, because data integrity is something we can also handle in our application code.

Since we are talking in the context of backends, we have a programming language in hand, and we can validate that whatever status comes from the user or the front end is one of those strings, active, completed, or archived. But here is the other advantage of using enums in your migration statements. Imagine you are going through your old migration statements to debug something, or you are onboarding a new team member and they are reading the migration history to understand what changes the project's database went through over the last couple of months or years: what the state of the database was, what the past states were, what changed over time. If they come across this projects table and the status field is of type text, while the allowed values live only in application code, there is no way for them to find out from the schema that you cannot put arbitrary text into the status field. To learn that, they would have to go through all the application code, see every place the field is used, track down all those use cases, and boil it down to the three values, which is a lot of headache. An enum type provides a kind of documentation, for you, your team members, and your future team members: at one glance you can tell exactly which values are allowed in a particular field, which makes everything much easier to understand.

So that is all about enum fields. The syntax: we write CREATE TYPE, the name of the enum, then AS ENUM, and inside the parentheses we provide all the allowed values. We create three enum types this way, and we are going to use them in our future tables.
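Concretely, the three types described above would be created like this (the exact label spellings, such as in_progress, are my assumption):

```sql
CREATE TYPE project_status AS ENUM ('active', 'completed', 'archived');
CREATE TYPE task_status    AS ENUM ('pending', 'in_progress', 'completed', 'cancelled');
CREATE TYPE member_role    AS ENUM ('owner', 'admin', 'member');
```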

Moving on, we are creating a users table. Since it is a project management platform, of course we will have users, so that is the first table we start with. Notice that each table here shares some default fields: id, created_at, and updated_at. users has them, user_profiles has them, projects has them; a sort of standard metadata for every table. The id is not really metadata, though; it is the primary key, and by primary key I mean we can uniquely identify a particular row of the table using this id. Say our backend has an API that takes the user ID as a dynamic path parameter and has to return the details of that user. Since the ids are UUIDs, let's generate a random UUID and paste it in a comment here; that is the ID of the user whose information we have to find.

Now, what if we do not mark the id field as the primary key? Assigning PRIMARY KEY to a field gives it some implicit properties: the field cannot be null (NOT NULL), and it also carries a unique constraint. Relational databases have this concept called constraints: you can attach a particular condition to a field, and that condition protects the field from going corrupt. That is a very high-level definition; we will see what it means as we encounter more examples. So PRIMARY KEY implicitly means NOT NULL and UNIQUE: the field can never be null, and it will always be unique, and if another entry with the same value gets inserted into the table we get a database-level error.

Because of those implicit properties, and some foreign-key behavior we will get to, we use the PRIMARY KEY property for our id field. By convention every table will have a field called id, assigned as the primary key, which is what lets us fetch the information of an individual row. So here we are saying: we have a field called id, it is of type uuid, it is the primary key of this table, and, by default, if no id is provided when an INSERT statement is executed, generate a random UUID. That default is provided natively by Postgres, so if we do not provide the id field, which we generally do not, we leave it to the database system to generate a random ID for us and assign it to that row. Next we have another field called email.

It is of type text and NOT NULL. One thing to keep in mind: by default, every column in a Postgres table can hold null values unless you explicitly specify otherwise. In practice, the large majority of your columns should carry the NOT NULL constraint, because it lets the database maintain a consistent state. If you do not add it, the application can push null values into the table even when you never intended to; a lot of database interaction happens through automated scripts, a script can have a bug, and you end up with a lot of nulls in your database, which makes things very difficult. So for reasons like that, always try to stick with NOT NULL unless you have a reason for the field to be nullable. That is why we write NOT NULL here. The email field also has a UNIQUE constraint, which means each row of this table must have a distinct email: the moment you try to push another row with an email address that already exists in the table, you get a database-level error saying it violates the unique constraint. Since we never want two users with the same email, we apply this constraint.

Next we have a field called full_name, again of type text and NOT NULL. Then password_hash, again text and NOT NULL; since we do not store the plain-text value of the password, we hash it and store the hash, hence the name. Then we have the two other fields, created_at and updated_at. We keep created_at because we want to know when the user was first created in our database. We usually do not insert it manually: the DEFAULT property says that if created_at is not passed, take the current timestamp and put it in the row. It is of type timestamptz, timestamp with time zone, so it holds the date, the time, and the time zone information. updated_at works the same way, and we create it so we can keep track of when some field of the row was last updated. That can have its use cases: for example, if you have a table in your front end and you want to display all the users ordered by when their details were last updated, you can sort by updated_at. And that is all for the users table.
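Putting the whole walkthrough together, the table would look roughly like this (a sketch; the on-screen DDL may differ in details, and gen_random_uuid() assumes Postgres 13 or newer):

```sql
CREATE TABLE users (
    id            uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    email         text NOT NULL UNIQUE,
    full_name     text NOT NULL,
    password_hash text NOT NULL,
    created_at    timestamptz NOT NULL DEFAULT now(),
    updated_at    timestamptz NOT NULL DEFAULT now()
);
```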

have users we have user profiles projects and that is a standard a lot of people also prefer the singular form user user profile project Etc ET it's a matter of what your team has decided and what your company has decided but the industry standard is to mostly stick with the plural form second thing all the table names and all the field names will be small case and snake case and by that I mean if it is a single word it

will have small case of course as we have users ID email right all small case if it has multiple words let's say full name password hash created it update then we'll use snake case if you are coming from other progr programming languages or Etc you might have the instinct to go with uh camel case right you might want to write something like full name with n capital and you want to merge and get rid of the snake case but that is not a good idea because in postgress everything is case insensitive

by Nature so even if you write camel case postgress will take it as so if you write something like this will name pogress will understand this will name unless you write it like this full name with quotes and when you're writing your application code using double Cod again and again can be very difficult and make the code very ugly that's the reason we always try to stick with small case we make everything small case and if you
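A quick illustration of the case-folding behavior, using hypothetical column names:

```sql
-- Unquoted identifiers are folded to lower case:
SELECT fullName FROM users;    -- Postgres reads this as: select fullname from users
-- Quoted identifiers preserve case, but must then be quoted everywhere:
SELECT "fullName" FROM users;  -- only matches a column created as "fullName"
-- snake_case sidesteps quoting entirely:
SELECT full_name FROM users;
```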

Those are conventions worth remembering. Moving on, we have another table called user_profiles, and the comment says it has a one-to-one relationship with users. You might ask: why create a separate table for this user information instead of storing it all in the users table? The reason is that when you design a database you have to account for the different situations in which the data will be modified. For a user, the only time their row in the users table changes is when they modify their core details, but a profile can hold a lot of other information. For this example we've limited it to an image URL, a bio, and a phone number, but a user could also have social media accounts, websites, projects, and so on. If all of that lived in the users table, we would keep writing migrations against it and constantly reshaping the users rows. We avoid that by abstracting the profile information into its own table, which can grow and scale over time without directly affecting the primary users table. These are the typical use cases for a one-to-one relationship: user profiles, user preferences, metadata, and so on. It's a convention you will see again and again in data modeling. So we have a table called user_profiles, and inside it we store the user's profile information.

For this table we don't need a separate id field. Because of the one-to-one relationship, each row in users has at most one row here, so we can drop the surrogate id and make user_id — the foreign key — the primary key of this table: we declare it as UUID and write PRIMARY KEY instead of NOT NULL, and we can also drop the UNIQUE clause, since a primary key is already unique. This user_id column is now both the primary key and the foreign key of the table. The remaining fields store the image URL, the bio, and the phone number, and notice that we do not write NOT NULL on them: they are optional. A user can choose not to have a bio, not to provide a phone number, not to upload a custom image, and that's why these fields are nullable.
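So the user_profiles table comes out roughly like this. Reconstructed from the narration; the optional column names avatar_url, bio, and phone are inferred from the description and may differ slightly from the actual file:

```sql
CREATE TABLE user_profiles (
    user_id    UUID PRIMARY KEY REFERENCES users,
    avatar_url TEXT,
    bio        TEXT,
    phone      TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```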

Moving on, we have another table called projects. The same pattern appears: a primary key id with a random UUID generated by default for each new project. We have the project's name, which is NOT NULL, and its description, which can be null — the description is optional, but a project must have a name. Then we have a status field using our custom enum type project_status; it is NOT NULL, and if you don't provide a value it defaults to 'active'. When we use enum types we usually supply a default from one of the enum's values, so the application code doesn't have to push it explicitly every time.

Then we have owner_id, a foreign key. It is a NOT NULL field that references the users table, with the constraint ON DELETE RESTRICT. Strictly you would write REFERENCES users (id), but because id is the primary key of the users table we can omit that part and just write REFERENCES users, as we do here; it still means owner_id references the id column of the users table.

The second part, ON DELETE RESTRICT, is an instance of what's called referential integrity, which means the database can protect data across different tables by using the relationships between those tables. In this case it controls what happens when someone deletes a user whose id appears in the projects table as an owner_id.

Take a concrete example. In the users table we have a row with id 1, and in the projects table we have a row whose owner_id is 1: that user created a project, so the project row points back at them through the owner_id field. With ON DELETE RESTRICT, if someone tries to delete user 1, the database checks whether that user has corresponding rows in projects. It does, so the database applies the referential-integrity action we specified; since we said RESTRICT, it fails the operation. You cannot delete a user until you delete their projects first.

Using referential integrity we can restrict operations on one table based on its relationships with other tables. There are a handful of referential actions. One is ON DELETE RESTRICT; another is ON DELETE CASCADE. If we had written ON DELETE CASCADE instead, then when someone deletes a user, the database checks for rows in projects whose owner_id is that user, and instead of restricting it cascades: it deletes the user and all of their associated projects from the projects table. There are also ON DELETE SET NULL and ON DELETE SET DEFAULT. SET NULL means that when the referenced user is deleted, owner_id is set to null — though since we declared owner_id NOT NULL here, that attempt would raise a database-level error and the operation would not go through. SET DEFAULT sets the column back to its default value, if a default was provided when the table was created. These are pretty much all the referential-integrity actions we use day to day to keep our data from becoming corrupt or inconsistent.
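The four ON DELETE actions, sketched as comments against the owner_id column — only RESTRICT is what this schema actually uses:

```sql
owner_id UUID NOT NULL REFERENCES users ON DELETE RESTRICT
-- RESTRICT:    refuse to delete a user who still owns projects
-- CASCADE:     delete the user's projects along with the user
-- SET NULL:    set owner_id to NULL (fails here, because owner_id is NOT NULL)
-- SET DEFAULT: set owner_id to its column default, if one was declared
```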

So in our schema a user cannot be deleted while they still own projects, because of ON DELETE RESTRICT on the owner_id foreign key. Projects also get created_at and updated_at, as before.

Moving on, we have another table, tasks. In the same way it has a primary key id, and it has a foreign key project_id. It is of type UUID because the primary key of projects is id, which is a UUID, and since we'll be storing those id values here, this column is a UUID as well. We declare it NOT NULL, which means a task cannot exist without an associated project — there can be no orphan tasks. Then comes the foreign-key clause: it says REFERENCES projects, and since we've omitted the column name, Postgres uses the primary key of projects; id is that primary key, so project_id here references projects.id. Then comes the referential-integrity

action: ON DELETE CASCADE. As mentioned already, ON DELETE CASCADE means that if we have a project with id 1 and a task with id 2 whose project_id is 1, and we delete the project row, the database sees that project 1 has associated tasks in the tasks table, and because we wrote ON DELETE CASCADE it deletes all the tasks associated with project 1 as well. This is a very useful concept in relational databases.

Moving on, the title field is of type text and NOT NULL, and description is text and nullable. Then we have priority, an integer holding values like 1, 2, 3, 4; it cannot be null, and if the application passes no

value, it defaults to 1. At the end of that line there is a CHECK constraint. In Postgres, or any relational system, there are several kinds of constraints. We have already seen UNIQUE, which guarantees that a column holds each value at most once across the whole table. We have seen NOT NULL, which forbids storing null in the column. CHECK is another: it lets us attach a custom condition to a column, and an insert or update only goes through as long as that condition is true. Here the check says: if no priority is passed, default to 1, but whatever value is passed must be between 1 and 5. That way nobody can insert some random value like 55 into the priority column; only 1 through 5 are accepted.

Moving on, we have a status field. It uses the custom enum type task_status that we created at the start, which can be 'pending', 'in_progress', 'completed', or 'cancelled'. The status

cannot be null, and if the application passes nothing it defaults to 'pending'. Then we have due_date, of type date — no time or time zone here, just a date. Then assigned_to, another foreign key, which references id in the users table; the referential-integrity action here is ON DELETE SET NULL, meaning that if a user is deleted and their id appears in some task's assigned_to column, that column is set to null for those rows. And again we have created_at and updated_at.

The nice thing about a foreign-key constraint is what it enforces on writes. Because assigned_to (or owner_id earlier) is a foreign key, and relational databases enforce foreign-key constraints by default, you cannot put arbitrary values into assigned_to. If you try to insert some random string — or even a perfectly well-formed UUID — that UUID must exist as a valid id in the users table; if it doesn't, the insertion fails. That's what we mean by the foreign-key constraint: writing REFERENCES users guarantees that whatever UUID you pass for this column has a corresponding row in the users table.
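Collecting everything described, the tasks table should look roughly like this. Again a reconstruction; the enum value spellings and default expressions are assumptions:

```sql
CREATE TABLE tasks (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id  UUID NOT NULL REFERENCES projects ON DELETE CASCADE,
    title       TEXT NOT NULL,
    description TEXT,
    priority    INTEGER NOT NULL DEFAULT 1 CHECK (priority BETWEEN 1 AND 5),
    status      task_status NOT NULL DEFAULT 'pending',
    due_date    DATE,
    assigned_to UUID REFERENCES users ON DELETE SET NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
```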

Another thing to notice here is that this pattern is called one-to-many. We write project_id as a foreign key referencing the id column of the projects table, which means a single project with id 1 can have multiple tasks in the tasks table, each pointing back at it through project_id — the value will be 1 in row after row. One project, many tasks: that's why it's called a one-to-many relationship, and this is how we implement it.

Compare that with the first kind of relationship, one-to-one. There, we took the primary key of the main table and made it the primary key of the second table as well, renaming it from plain id to the table name plus _id. Because the main table was users, in the one-to-one user_profiles table we reused the users primary key as that table's primary key and called it user_id. For one-to-many, we again carry the id over into the other table, but we keep it as a plain foreign key rather than a primary key, and refer back through it. Next we'll see how to implement a many-to-many relationship.

Moving on, this is the last table, project_members. The reason we create it is that we want to keep track of all the projects a user is part of and, at the same time, all the users associated with a project.

You can look at the relationship from both directions, and when the same data makes sense viewed both ways, that's how you know it is many-to-many. A user can be part of multiple projects at the same time, and a project can have multiple users at the same time; a project is not limited to a single user, and a user is not limited to a single project. That's why projects and users have a many-to-many relationship, and we track it by creating a separate table, usually called a linking table. This is the pattern for implementing many-to-many: a dedicated table whose whole job is to maintain the relationship between two other tables — hence "linking table".

The shape of this table is a little different. Instead of creating its own id, we remove that column and take foreign keys from both tables: we have project_id here and we

have user_id here. project_id references the id column of the projects table, with ON DELETE CASCADE as its referential-integrity action; user_id is likewise a foreign key referencing the users table, again with ON DELETE CASCADE. That means: if a project is deleted, delete all of its entries in project_members, and if a user is deleted, delete all of their entries in project_members. Finally, instead of writing UNIQUE, we write PRIMARY KEY (project_id, user_id).

What this creates is, in technical terms, a composite primary key: a combination of two columns — both foreign keys referencing the primary keys of two other tables — taken together as the primary key. If a user is part of a project, that membership is exactly one row in project_members. Say the users table holds users 1, 2, 3 and the projects table holds projects 1, 2, 3, and through some user journey user 1 gets added to project 2. To record that relationship we take the project_id, which is 2, and the user_id, which is

1, and that combination gets a single row in project_members. User 1 being part of project 2 is a single fact — it cannot be duplicated; user 1 cannot be a member of project 2 multiple times, and project 2 cannot contain user 1 multiple times. So we took the combination (project 2, user 1) and declared that the two columns together form the primary key of this table. Because a primary key implicitly enforces constraints like UNIQUE and NOT NULL, this combination can appear only once in the table, can never be null, and uniquely identifies its row. That is the crux of how we implement a many-to-many relationship in Postgres, or any relational database: a linking table carrying a composite primary key built from the two tables being linked.
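Assembled, the linking table looks roughly like this (reconstructed; the role column and the timestamps are the ones described in the walkthrough):

```sql
CREATE TABLE project_members (
    project_id UUID NOT NULL REFERENCES projects ON DELETE CASCADE,
    user_id    UUID NOT NULL REFERENCES users ON DELETE CASCADE,
    role       member_role NOT NULL DEFAULT 'member',
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (project_id, user_id)
);
```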

The rest of the table is familiar: we have created_at and updated_at, plus a role field using another custom enum, member_role, which we created at the top of the file; it can be 'owner', 'admin', or 'member'. When a user is part of a project, whatever project-specific attributes that user has can live in project_members, and because a role only makes sense in the context of a user belonging to a project, that's where we keep it. By default the role is 'member'. And that's pretty much all of our up migrations.

In the same way, as I mentioned, we have to write our down migrations. Everything we did — creating a couple of types and a couple of tables — must be reverted there, so that when we want to roll back to a particular version, the migration tool can use those statements to perform the rollback. We have to state explicitly that these are the down migrations, and here they simply drop all the tables and then, after the tables, all the types, in the reverse order we created them.
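The down migration, as described, is just drops in reverse creation order — roughly this (dbmate marks the sections of a migration file with `-- migrate:up` and `-- migrate:down` comments; the exact enum and table names mirror the schema above):

```sql
-- migrate:down
DROP TABLE project_members;
DROP TABLE tasks;
DROP TABLE projects;
DROP TABLE user_profiles;
DROP TABLE users;
DROP TYPE member_role;
DROP TYPE task_status;
DROP TYPE project_status;
```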

That's pretty much all the tables, so let's go ahead and apply these migrations to our database and see whether they work. To apply them, dbmate has a command: you run dbmate up, it applies the migration file, and it reports that it was applied in a few milliseconds. Back in the database tool, refresh and expand the tree and you can see all the tables we just created. You can open each one and check its schema — project_id, user_id, role, and so on. Since we haven't inserted any data yet, all the tables are empty. You will also notice a special table called schema_migrations, created by dbmate itself. As I've already mentioned, every migration tool has to keep track of

the current state of the database so it can decide which migrations to apply. After this first migration, if you create another migration file, the tool checks the current version and applies only the migrations from that point onward, rather than re-running everything — if it tried to execute the same statements again, Postgres would raise an error that the table already exists. That's why the migration tool maintains the schema_migrations table, which has a single column, version, recording the current migration version.

Now that our tables are ready, we want to insert some data. In production, once the backend is deployed, data arrives through normal user flows: people come to your platform, sign up through front-end forms, and create projects through other forms.

Those are the predefined paths by which data enters your database. But while you are still building the backend and modeling the data, you want to test your changes and see what user interactions will look like — so in development environments you put test data into the database, giving yourself something to test against before you deploy. For that we have a concept called seeding: you write a script, usually as a separate migration (you could put it in the same migration file, but best practice is a dedicated one), that inserts test data into your database for testing purposes in development environments. So let's create

another migration file with the command dbmate new seed_data. Press enter and it creates another file named with a timestamp plus seed_data; because this timestamp is greater than the previous one, the migrations stay sequentially ordered. The file is empty, so again I'll paste in some SQL statements to save time, and then we can walk through how the data is inserted for seeding. There is nothing special here — just basic SQL queries. If you've already explored the basics of SQL and Postgres you'll follow easily; if not, please do that first, it will make this video much easier to understand. We're using a CTE, also called a common table expression, to make the query more readable: it defines an intermediate result called inserted_users from an INSERT into the users table, with some sample emails, names, and password hashes, and we're

returning two fields from that insert: the ids and the emails of the inserted users. A second intermediate step, inserted_profiles, consumes it: FROM inserted_users we SELECT each user's id, put a static image URL into the avatar_url column, and for the bio use a CASE WHEN expression — if the email is one value, use this bio; if it's another, use that bio; and so on — so different users get different bios. For the phone column we just insert a sample phone number. The remaining statements insert some projects and so on in the same way: ordinary SQL, a normal seed.
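The shape of that seed script, sketched with invented sample values (the video's actual emails, names, and bios differ):

```sql
WITH inserted_users AS (
    INSERT INTO users (email, full_name, password_hash)
    VALUES
        ('alice@example.com', 'Alice', 'hash1'),
        ('bob@example.com',   'Bob',   'hash2')
    RETURNING id, email
)
INSERT INTO user_profiles (user_id, avatar_url, bio, phone)
SELECT id,
       'https://example.com/avatar.png',
       CASE WHEN email = 'alice@example.com' THEN 'First bio'
            WHEN email = 'bob@example.com'   THEN 'Second bio'
            ELSE 'Default bio'
       END,
       '+1-555-0100'
FROM inserted_users;
```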

Let's apply this migration so we have some data to test with: run dbmate up again, and it seeds all the data into our database. Open the tool and browse any table — projects has entries, and so do project_members, users, and the rest; you can click through the rows. In users we see ids, emails, full names, password hashes for each user; in user_profiles each user has a user_id, an avatar URL, and a bio. Projects are populated too. Our seed was successful, and we now have data to start testing with.

So: tables created, seeding done. Next we want to build

some APIs. The first API we want fetches the list of all users and sends it to the front end. For that we can write a simple SELECT * FROM users, execute it, and get every row in the users table. But we also want, for each user, their profile information, which lives in a different table, user_profiles. We want to fetch each user's profile, embed it into the user row, and return it all in a single API call — we don't want the client to make a separate request just for profile data, so we embed the profile before sending it. Let's write that query. A good habit when writing SQL is to start from the FROM clause rather than the SELECT clause, so you know exactly where you are drawing

your data from. So we write FROM users u — the u is called an alias. Assigning a short one- or two-letter alias to a table name makes it much more convenient to refer to that table again and again within the query; that's why we use aliases. From the users table we get the user's core information, but we also want the user's profile information, which is in the user_profiles table. The thing connecting the two tables, as you already know, is the foreign key: each row of user_profiles has a user_id column, a foreign key referring to the primary key of the users table. So we can use a join

operation to join users with user_profiles on the condition that the user ids match. We write LEFT JOIN, and the reason for a left join is this: it is possible in our backend that a given user has never edited their profile, so no row was ever created for them in user_profiles. In that situation an inner join would drop the user from the result entirely, because an inner join only emits a row when both tables have a match for the join condition. But in this case we want the user's information regardless of whether they have a row in user_profiles — we want the

user information anyway. That's the logic behind going with a left join: even when there is no matching entry in user_profiles, we still get the user's values from the users table. In day-to-day SQL you'll mostly use inner joins and left joins. So we join with the user_profiles table, alias it up, and the join condition is ON u.id = up.user_id: the id column in users must equal the user_id column in user_profiles. That is the condition on which we join the two tables. Now we know where we are drawing our data from — two joined tables — and we can move to the SELECT clause to decide which data to fetch.

We want everything from the user row, so we write u.*, and then one more field per row, called profile: for it we want the corresponding row from user_profiles converted into JSON and embedded under that name. Postgres provides a built-in function for this, to_jsonb, which converts a row into JSON. So we write to_jsonb(up.*) AS profile — take the matching user_profiles row, convert it to JSON, and name it profile in the result. That's our query.
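Putting the pieces together, the query as described looks like this:

```sql
SELECT u.*,
       to_jsonb(up.*) AS profile   -- NULL when the user has no profile row
FROM users u
LEFT JOIN user_profiles up ON u.id = up.user_id;
```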

Running it, the result has all the fields from the users table — id, email, full_name, password_hash, created_at, updated_at — plus an extra field, profile, a JSON object carrying all the information from the user_profiles table. With this query, a single API endpoint can send both the user data and the profile data from a single database call, thanks to the join.

One more thing to notice: when you run a query in any relational database without specifying an order, the rows come back in no guaranteed order. Because of that, whenever a SELECT returns a list of records to the front end, we should sort it explicitly by some column. Usually we sort by created_at, returning records in reverse order of creation so the user sees the newest entries first. For that

we write ORDER BY u.created_at DESC: take the created_at column from the users table and sort on it, descending, because we want the latest entries first. Run it and we get all the users in reverse order of creation. And that is what the query behind a "get all users" API looks like: if we have an endpoint like GET /v1/users, our backend can use a database query like this to return the data. Of course there is other processing around it — serialization and deserialization, depending on the language you're using. With JavaScript, TypeScript, or Node.js the serialization overhead is small; with something like Go, the rows are first deserialized into the language's own struct, and that struct is then sent

back by serializing it into JSON before it goes over the internet to the front end. That is a typical "get all users" endpoint. Now let's look at another endpoint, something like GET /v1/users/:user_id. The user id arrives as a dynamic parameter in our API, and we pass it into the database query as a parameterized query. We'll see what a parameterized query means in a moment — and if this endpoint-design part is unfamiliar, please watch the previous video, where we did a deep dive on API design, dynamic parameters, and how these endpoints are structured. So let's look at the query for fetching the profile information and user information of a particular user, given a user id from our dynamic parameter. Before we get into that, let's

understand what is a parameterized queries parameterized query is basically a safety mechanism provided by databases which basically means before running a query you can provide a particular slot can provide a particular slot and you tell the database that this is your query and at this position there is an empty slot but before I execute this query I will

provide the information in this slot now the catch is whatever information I provide here it will be a string you can cannot pass it you cannot pass it to render it as a database action call right it will just be a string so if someone passes something very dangerous like so we have a query and inside the slot if you pass like let's say delete from users a query which deletes everything deletes all the users in the database that will not be considered as

a SQL query because the way parameterized query works is whatever you pass in this slot they are considered as a string okay they are escaped in the technical term and it is a safety mechanism to avoid situations avoid vulnerabilities like SQL injection where if you are constructing your query instead of using parameters you are constructing by concatenating Different Different Strings you constructing your SQL query instead of using parameters

then you can be a victim to SQL injection okay so this is a very important concept you can do your own research to understand how SQ injection Works etc etc but to simplify everything whenever you have a dynamic value in this case we want to fetch the information of a single user using the ID of that user so the ID of the user is the dynamic value here so what we want to do we want to keep a particular slot in our query and we tell the database

that when we run this query we are going to provide the ID of the user but for now it is just an empty slot and when we provide the ID it will it should just be considered as an user's ID just a simple string nothing else no more power should be given to this particular string okay that's the use of parameterized query it makes your query more secure to execute and most of the times whatever database driver whatever library that you are using whether in nodejs or go rust python whatever it is your database
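The escaping behavior described above can be demonstrated in a few lines. This sketch uses Python's sqlite3 in place of a Postgres driver — the principle is identical: the query text and the value travel separately, so the value can never be executed as SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'jane@example.com')")

malicious = "1; DELETE FROM users"

# Parameterized: the driver sends the query and the value separately,
# so the value is only ever compared as data, never run as SQL.
rows = conn.execute("SELECT * FROM users WHERE id = ?", (malicious,)).fetchall()
count = conn.execute("SELECT count(*) FROM users").fetchone()[0]
print(rows, count)   # no match, and the table is untouched

# By contrast, naive string concatenation executes the injected SQL:
conn.executescript("SELECT * FROM users WHERE id = " + malicious)
count_after = conn.execute("SELECT count(*) FROM users").fetchone()[0]
print(count_after)   # every user was deleted
```

The parameterized call returns nothing and leaves the table intact; the concatenated version happily runs the embedded DELETE. That contrast is the entire argument for parameterized queries.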

Most of the time, whatever database driver or ORM you are using — in Node.js, Go, Rust, Python, whatever — has the facilities to handle these slots and parameters for you. Since we are not getting into code or any specific programming language here, we'll look at how parameters work in the SQL editor itself. Fortunately, TablePlus — the software we're using to write our queries and view our results — provides an interface for passing parameters to a query, so we can see how it works; but in the real world, when you're actually building a backend, you'll handle parameters through your library or driver.

With that, let's write our query to get the information of a single user. We don't need to change much: we still want all of the user's information from the users table, and all of the profile information from the user_profiles table via the JSON conversion. The only thing that changes is that we filter by the user's ID, since we want a single user's information. For that we add a WHERE clause: WHERE u.id = :user_id. This is where we assign the empty slot — the variable is named user_id — and we're telling the database: here is the whole query, here is the empty slot, and when we run the query we'll provide the slot's value so you can do the comparison and show us the results.

When we run this, we're shown a dialog where we can enter the actual user ID value — though, as I said, in a real backend environment you'd handle this in code through your driver or library. We have to wrap the value in single quotes: all string literals in SQL must be wrapped in single quotes to be interpreted as strings. So we wrap it as a string and paste in the user's UUID. When we run it, we get the information of a single user — you can check that the user ID matches. We're still getting all the information from the users table and all the information from the profiles table, but now only for a single user, using our parameter with the same query.

This, if you remember, was for the endpoint GET /v1/users/:user_id. Ideally, during routing, when the front end makes the API call with the appropriate UUID, the request is routed to your handler, the handler passes it to your service, the service passes it to a repository, and the repository binds the UUID to the parameter slot I mentioned. You get a result like this, and it is serialized and sent to the front end as JSON.

Going back to the "get all users" query: what if we want to support some dynamic filters or dynamic sorts? Usually, in APIs like these, where we fetch an array of entities — get all users, get all projects, get all tasks — we also have to support some kind of dynamic sorting, dynamic filtering, or both at the same time.

Now, the problem is that without constructing these queries in a proper ORM or in a programming language — constructing the SQL query ourselves — it's a little difficult to demonstrate that part of the mechanism in a SQL editor. But we can imagine how the query gets constructed in an actual programming language when we're writing our backend code, and focus on just the SQL query part without worrying too much about how the query is assembled.

So let's go back to the earlier "get all users" query: when we remove the WHERE clause and run it, we get the list of all the users, just as before. Now we want to add some dynamic sorts and dynamic filters. Let's see what fields we have in the users table: id, email, full_name, password_hash.

Say we want to add a filter condition on the name: the front end passes a first letter, and we return all users whose name starts with that letter. That is our filter condition. For the sort, we take a sort order from the user, plus the field by which they want the results sorted. As I said, it's a little difficult to show how the query gets constructed, because here is what happens in an actual backend. Say this is our query payload and the generated query. The user can send these query parameters: a page parameter — if they don't send it, we default the page to 1; likewise a limit parameter — if they don't send it, we set it to something like 10 or 20, whatever makes sense in your use case.

Then comes the filtering part. As I said, we filter by the first letter of the name, so we take a query parameter called letter. If it isn't passed, we don't set any default — we set it to null, or, in an actual backend system, we simply don't include it in the query at all. When constructing the query dynamically, we check whether the user passed this query parameter: if they did, we include it in the WHERE clause; if they didn't, we leave it out entirely. That's how it's done in a real backend.

The same goes for the sort conditions. First we check sort_by: if that query parameter is present, we use it as the sorting field. Usually we don't let the user pass arbitrary fields as sort parameters — we give them a few options, say full_name, email, or created_at: a whitelist of fields from the table, and we sort the results by whichever they choose. The last thing we allow is sort_order: whether the results come back ascending or descending. For these two fields, if the user passed sort_by, we use it for sorting; if not, we default to created_at. Likewise, if they passed sort_order, we honor ascending or descending as given; if not, we default to descending.

So if the user passed none of these, what do we end up with? The defaults: page = 1, limit = 10, sort_by = created_at, sort_order = descending. We return results according to these values by default, but we also give the user the option to customize how they want the results.
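The defaulting-and-whitelisting logic above can be sketched as a small query builder. This is a hypothetical helper, not code from the video: the function name and defaults are assumptions, and it uses plain LIKE so it runs on SQLite (Postgres would use ILIKE). One detail worth noticing: values go into the parameter list, but column names and sort directions cannot be bound as parameters by real drivers, which is exactly why they must be checked against a whitelist:

```python
# Hypothetical sketch of how a backend might assemble the "get all users"
# query. Only values become parameters; identifiers are whitelisted.
ALLOWED_SORTS = {"full_name", "email", "created_at"}

def build_users_query(page=1, limit=10, letter=None,
                      sort_by="created_at", sort_order="desc"):
    if sort_by not in ALLOWED_SORTS:
        raise ValueError("unsupported sort column")
    if sort_order.lower() not in ("asc", "desc"):
        raise ValueError("unsupported sort order")

    sql = "SELECT u.id, u.email, u.full_name FROM users u"
    params = []
    if letter is not None:                  # include the filter only if provided
        sql += " WHERE u.full_name LIKE ?"
        params.append(letter + "%")
    sql += f" ORDER BY u.{sort_by} {sort_order.upper()}"
    sql += " LIMIT ? OFFSET ?"
    params.extend([limit, (page - 1) * limit])
    return sql, params

sql, params = build_users_query(letter="J", sort_by="email", sort_order="asc")
print(sql)
print(params)
```

Calling it with no arguments produces exactly the defaults discussed above: page 1, limit 10, sorted by created_at descending, with no WHERE clause at all.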

With that, let's see how we construct the query. We have our filter, which is the first letter of the name, and our sort — we allow sorting by email, full_name, or created_at — and we also want to paginate this, since it is a list of all users. Whenever we build list APIs, we should paginate them most of the time, unless we have a particular reason — a particular user interface — where we don't want to.

Now, going back to our query: we have the plain query, and we want to add some more capabilities to it. The first is the filter condition: we want to give users the ability to filter by the first letter of the user's name. So we write WHERE u.full_name ILIKE ... — ILIKE is the SQL LIKE operator, but case-insensitive, so it doesn't matter whether the letter is uppercase or lowercase; it will still match the pattern. Inside the ILIKE we pass a parameter — call it letter — and concatenate it with the percent sign.

Let's run this query and provide the letter J: we get two results, because these two users have names starting with J. Now let's try another letter, say X: we get no results, because we don't have any users whose names start with X. Likewise, if we run it with A, we get the user Alice Brown. So the filter part is working as expected.

For those who are not familiar with the basics of SQL, what we're doing is taking the parameter — whatever comes from the front end — and concatenating it with the percent symbol. This means: take the letter, whatever it is, so if we pass J the pattern becomes 'J%', and that goes into the ILIKE pattern matching. We're saying: give us all the names that start with the letter J, and we don't care what comes after. That's why we get all the users whose names begin with J.
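The prefix-matching behavior can be verified in isolation. Again sqlite3 stands in for Postgres — SQLite's LIKE happens to be case-insensitive for ASCII by default, which makes it behave like Postgres's ILIKE for this example; the sample names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (full_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)",
                 [("John Doe",), ("Jane Smith",), ("Alice Brown",)])

letter = "j"   # as if it came from the front end, lowercase on purpose
# Prefix pattern: first letter must match, anything may follow.
rows = conn.execute(
    "SELECT full_name FROM users WHERE full_name LIKE ? ORDER BY full_name",
    (letter + "%",),
).fetchall()
print([r[0] for r in rows])
```

Even though we passed a lowercase "j", both "Jane Smith" and "John Doe" match, while "Alice Brown" does not — the case-insensitive prefix filter in action.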

Now that we have our filter condition implemented, let's go ahead and implement our sort conditions. As you can see, we currently have the default sort condition — created_at descending — written in, and we want to make it dynamic: ORDER BY a parameter called sort_by, and for the direction, another parameter called sort_order.

Let's check this: we now have three parameters to fill. For the letter we provide J. For sort_by, in the backend we have to construct this parameter so that it matches the existing query — since we're using aliases in the query, we have to provide the sort field the same way. We want to sort by the users' emails, so the parameter we should pass is u.email, matching the alias that exists in the query. Likewise, for sort_order we say descending, and when we run it we get two results, John Doe and Jane Smith, sorted so that John comes first and then Jane. Let's run it again, but ascending this time: now Jane comes first and John comes second. So our sort conditions are also working as expected.

The last thing is pagination: since this is a list of all users, we want to paginate the query. At the end we add an OFFSET, taking a parameter for the page, and likewise a LIMIT, with a limit parameter. Now we have two more parameters to fill. For the offset we pass zero: in the actual backend, the user passes pages starting from 1 up to whatever the maximum is, but since we're talking to the database directly here, the first page corresponds to offset 0 — the offset starts from zero. That's how we differentiate between the actual database query and what we expose for the user experience. So we start from offset 0, and we set the limit to 1 — since we only get two results for this query, and we want to check that pagination works, we set the limit to 1. When we run this, we get Jane Smith as the first result; and when we run it again asking for the next page — page 2 in backend terms, offset 1 in database terms — we get John. So the pagination concept is also working.

As I mentioned, these details will change a fair amount when you write the actual SQL in your backend, because you have to construct these queries dynamically in your programming language; but to understand what happens at the database level, this example is pretty much enough.
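The page-to-offset translation just described is worth pinning down in code, because the off-by-one is a classic bug. A minimal sketch, again on sqlite3 with assumed sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (full_name TEXT, created_at TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    ("Alice Brown", "2024-01-01"),
    ("Jane Smith",  "2024-01-02"),
    ("John Doe",    "2024-01-03"),
])

def fetch_page(page, limit):
    # The API speaks in 1-based pages; the database speaks in 0-based offsets.
    offset = (page - 1) * limit
    return conn.execute(
        "SELECT full_name FROM users ORDER BY created_at DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()

print(fetch_page(page=1, limit=1))   # newest user
print(fetch_page(page=2, limit=1))   # the next one
```

Page 1 maps to OFFSET 0 and returns the newest row; page 2 maps to OFFSET 1 and returns the next newest — the same behavior demonstrated in the editor.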

So far we have seen what the database queries look like for the "get all users" API and the "get a single user" API. Now let's proceed to creating a user, which is POST /v1/users, and see what the query for it looks like. If you're already familiar with SQL basics, this is a simple insert operation: INSERT INTO users — we want to insert into the users table — and the columns we're passing are email, full_name, and password_hash, which we would normally create in our backend code, but for the sake of this example we pass the hash directly. Then we pass the actual values, and let's use parameterized queries here: one for email, one for full_name, and one for password_hash. At the end, we want to return the rows that were created by this query, and for that we can write RETURNING * — return all the rows created by this statement.

So let's try to run it. We have to pass all the parameters, in single quotes since they are strings: for the email we can do something like test@gmail.com, for the full name "test test", and for the password any random string, for the sake of the example. When we run it, we get a single row back as the result — because we did RETURNING * and we created just one user — a new user entry with all of its information. And when we go to the users table, we can see that the new user with test@gmail.com has been created and is in the table.
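A runnable sketch of the insert flow, using sqlite3 (assumed column values). Postgres lets you append RETURNING * to get the created row back in the same round trip; to stay portable across older SQLite builds, this version reads the row back by the generated primary key instead, with the RETURNING idea noted in a comment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    full_name TEXT NOT NULL,
    password_hash TEXT NOT NULL
)""")

cur = conn.execute(
    "INSERT INTO users (email, full_name, password_hash) VALUES (?, ?, ?)",
    ("test@gmail.com", "test test", "not-a-real-hash"),
)
# In Postgres: "INSERT ... RETURNING *" returns the new row directly.
# Here we fetch it back via the auto-generated primary key.
row = conn.execute("SELECT * FROM users WHERE id = ?",
                   (cur.lastrowid,)).fetchone()
print(row)
```

The fetched row carries the generated id plus every value we inserted — exactly what the POST endpoint would serialize back to the client.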

This corresponds to the POST /v1/users API: we did an insert operation, created a user, and returned the response. Next, let's get to updating a user. Going back to our query editor: as I mentioned before, in our actual backend code we construct these SQL queries dynamically, depending on the parameters passed by the user. The update API is usually a PATCH endpoint, and we give the user the ability to update certain properties of their profile — we're updating the user_profiles table, and the UI gives the user some kind of form to update their profile: they can update their bio, their phone number, or their image URL in the avatar_url field.

We want the payload to be partial, and by that I mean the client can pass all three fields or just a single one; depending on which fields they pass, we only update those — all the other fields we don't touch. So, in whatever programming language you're using, you check which fields the user passed, from a set of allowed fields. Say we check whether bio is present: if so, take the latest value the user passed and update it in the database. Likewise, if phone_number is passed, update it; but if avatar_url is not passed, don't touch it — keep it the same. With that understanding, going back to our SQL editor: suppose our backend code finds that the user passed only bio and phone_number. This is what the query will look like after it gets constructed.
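The "only update the fields that were sent" logic can be sketched as a small builder. This is a hypothetical helper — the function name, field names, and placeholder style are assumptions, not code from the video — and like before, column names come from a whitelist while values become parameters:

```python
# Hypothetical sketch: build an UPDATE that touches only the fields the
# client actually sent in the PATCH payload.
ALLOWED_FIELDS = {"bio", "phone", "avatar_url"}

def build_patch_query(user_id, payload):
    fields = {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}
    if not fields:
        raise ValueError("no updatable fields in payload")
    set_clause = ", ".join(f"{name} = ?" for name in fields)
    sql = f"UPDATE user_profiles SET {set_clause} WHERE user_id = ?"
    return sql, [*fields.values(), user_id]

sql, params = build_patch_query(42, {"bio": "updated bio", "phone": "555-0100"})
print(sql)
print(params)
```

A payload containing only bio and phone produces a SET clause mentioning only those two columns — avatar_url never appears, so it is never touched.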

We say UPDATE user_profiles SET bio = :bio — this will be a parameterized query — and, since they also passed a phone number, phone = :phone; we'll supply those values through our parameterized queries. We also have to specify which user to update. Usually we take that as a dynamic parameter from the URL, as you can see here: we take the ID of the user from the URL, and that's what we pass to the database, since we want to update the profile information of a single user. So we pass it as another parameter and write WHERE user_id = :user_id, where user_id is a field in the user_profiles table. At the end we say: whatever row you have updated, return the latest value of that row.

Let's run this: for bio we pass "updated bio", for the phone number a new value, and when we run it and check, you can see the bio is updated and the phone number is also updated for this particular user. That's what an update-user API looks like.

Now I want to show something else. Notice that we have a field called updated_at. The purpose of keeping this field in a table is that every time a particular row is updated, we should also set this field, for that row, to the current timestamp — the timestamp of when the row was updated.

But if we check the field, it's the same as created_at — the same as when the user was first created. It was not updated even though we just updated the row's values; it should be showing the current timestamp, and it isn't. There are usually two ways of achieving this. The first is the manual way: every time you do an update operation, you explicitly set the updated_at column to the current timestamp from your application code — you pass another parameter and write updated_at = :updated_at. Or you can use another database feature called a trigger. With triggers, you set up a condition, and when that condition is met, the database performs some action — that's the whole idea of triggers at the database level. What we'll set up here is a rule that says: every time a row in a table is updated, set that row's updated_at field to the latest timestamp. Using a trigger means we don't have to manually change the updated_at field every time we update a row.

For that, we'll write another migration — that's one change we want to make. The second thing you should know about, which is a very important concept, is indexes (or indices). Database indexes affect the performance of your queries a lot, so we have to understand what indexing is and implement it in our own database using a migration.

Let's understand what indexing is in the first place, going back to our earlier query. If we go to the history tab and pick this one — the SELECT query for the list of all users — notice the conditions we're using. The first thing you notice is that we're joining with another table, saying the id of the users table must be the same as the user_id in the user_profiles table: that is the join condition, and it is an equality condition. The second thing is that we're sorting by the email field of the users table, in ascending order. Keep these two things in mind — I'll explain shortly why I'm drawing attention to them.

Now, what is a database index? By the everyday definition, think of the index section of a book. If we Google "book index", we see a typical index section: the writer, or the publisher, has created a section listing where each chapter begins — chapter 1 starting from one page, chapter 2 starting from page 29, chapter 3 from page 35, and so on — every chapter with the corresponding page number where it starts. Because of this index section, if someone wants to skip straight to chapter 4, they don't have to open the book and go page by page. Say the book has around 100 pages: to find chapter 4, the reader doesn't have to skim through 50 pages one by one just for the sake of locating it; they can refer to the index, see that chapter 4 starts on page 54, and jump directly to page 54. The same idea applies to databases.

Going back to our earlier tables — this is the tasks table; we have a bunch of tasks, and every task has a single ID. We see all the results laid out in this format because the software fetches the rows from our database and shows them to us sequentially, but when the database stores our data, it is not sequential: each row might live somewhere else. As we discussed in the earlier part of this video, the databases we're talking about — relational or non-relational — are disk-based databases, which means the database takes our data and stores it somewhere on the disk, in a very efficient format; a lot of algorithms and complex systems go into deciding where to store which data. But at a very high level, as a simple way of understanding it, we can assume each row is stored somewhere on the disk.

Now, suppose we have a query that asks for the information of one particular task, with a particular ID that we pass as a parameterized query. What is the brute-force way — without thinking about indexes — for the database system to answer it? It has a bunch of tasks, stored in different locations on a physical hard disk. When the query arrives, it has to go through the tasks one by one, checking whether the ID matches this one, or this one, and so on — until it finally reaches a match: the ID we passed exactly matches this particular row's ID, and the database returns that row's information to us. What happened is: we sent a query, and the database did a sequential scan — it went through all the tasks one by one, in different locations on the disk, checking whether each ID matched, until at the end it found a match and returned the information. As you can already see, that is a tedious, time-consuming, and very inefficient solution.

Since we only have around six entries in this table, it may not seem slow. But imagine a thousand entries, or a million, or a billion: going through all those items one by one, reading from different locations on the disk, can take a lot of time and makes the query very inefficient. For that purpose, databases offer a feature called an index.

What does an index do? In the same way a book index gives us the page number for each chapter — if we want chapter 4, we can see directly which page it starts on — a database index, at a very high level (we won't do the technical deep dive into how indexes and B-trees work; you can research that yourself), serves the practical purpose of fast lookup. We can index a particular field of a particular table. Say we index by ID: then for each ID, the index contains the corresponding location of the row — exactly where on the hard disk the information for that particular task is stored. It is another table, a lookup table, storing the ID of the first task and its location, the ID of the second task and its location, and so on.

Now that it has an index like this, when the database gets a query asking for a particular task, instead of looking through all the tasks on the disk, scattered across different locations, one by one, it refers to the index and checks whether the ID exists there. The index gives direct access: it is not stored in different parts of the disk but in one place, sequentially, so it is very fast to go through its entries. The database checks whether it matches the first one, the second, the third — quickly finds the matching entry, reads the corresponding lookup value for the exact location where the task's information is stored on the hard disk, goes to that location directly, and returns whatever value is stored there to the user who made the query in the first place.

Compare that with the previous operation — a sequential lookup going through different locations across the hard disk, testing whether the ID we passed matches the current one, until finally returning the result. With the index, we have a lookup table storing each item's ID and that item's location, and we hand the result directly back to the user. That is the purpose of an index: you can imagine it as a lookup table holding the values of a particular field and the location of the row for each value.

That is the first property of an index: it is a lookup table, so the database can take whatever value you pass, find it directly in the table, locate the corresponding row, and return you the result. The second property of an index is its order: it can be ascending or descending. If you remember the earlier query where we return the list of all users sorted in descending order — for that query to perform as efficiently as possible, given that requirement, the first thing we do is create an index on created_at: instead of ID, we index the created_at field, so that when we execute the query, the database can instantly find the created_at values and the corresponding rows and return them.

Second, because the sort order is also in the picture, we have to decide in what order we want the index: ascending, descending, or both — we can create different indices; it all depends on our requirement. Since by default we return users in descending order of created_at, when creating the index we say we want it on the created_at field in descending order. Then, when we execute the query with ORDER BY created_at DESC, the index can quickly walk through the created_at values, which are already ordered descending, and return the result quickly. If instead the index held all the created_at values in ascending order, the database would have to go through them, rearrange — do a sort — and then return. Depending on our requirements, we can customize the behavior of the indices.

That's a very high-level introduction to indices. What you have to remember is: every time you use a WHERE clause or a join condition — and depending on how frequently you run that join condition or WHERE clause — you can decide to create an index on that condition, so that subsequent queries are faster, because of how the database performs the lookup.

trigger so that the updated at field is automatically updated every time we update a particular row of the user second is we have to create indexes so that the join conditions and the wear Clauses and the Sorting conditions are faster okay so let's go ahead and create a migration which covers all these I have gone ahead and created a new migration file using dbate and pasted all the uh index create statements and the trigger statement so let's go through and understand what's Happening Here what are we exactly doing here so

In the first section, in the up migration, you can see we are creating a bunch of indices. With the first one we are creating an index on the email field in the users table. Let's imagine we have an API where we have to find a particular user by email, or a database query where we are joining the users table with the tasks table based on the email of the user, so the join condition is matching on email. Because we have a requirement like that, we decided to index the email field of the users table, so that in join conditions and in WHERE clauses we can quickly find the corresponding user for a given email using the index. That was the reasoning behind indexing email on the users table, and this is what the syntax looks like; it's generic SQL syntax.
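Concretely, the email index and the kind of lookup it accelerates might look like this (a hypothetical sketch; the column and index names are assumptions):

```sql
CREATE INDEX idx_users_email ON users (email);

-- Point lookup in a WHERE clause:
SELECT id, email FROM users WHERE email = 'alice@example.com';

-- The same index helps when email drives a join-based query:
SELECT u.id, t.*
FROM users u
JOIN tasks t ON t.assigned_to = u.id
WHERE u.email = 'alice@example.com';
```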

In the same way, we are creating an index on the created_at field of the users table, in descending order, because, if you remember the SELECT query behind the "get all users" API, by default we return all the users sorted by the created_at field in descending order. Since we have an API like that, and it is pretty frequently called, we want to optimize the execution of that particular database query by creating an index on the created_at field in descending order, so that the database can do the fetch very fast compared to a fetch without the index. Okay, in the same way, for the tasks table we are creating an index on project_id. If you remember, going back to our first migration where we created the tables, we have the tasks table and it has a project_id column here, and this is a foreign key. So imagine we have an API where we are fetching all the tasks of a particular project; for that API we have

to create a database query where we take the projects table and LEFT JOIN it with the tasks table, on the id field of the projects table matching the project_id field of the tasks table. That is going to be our join condition: those two have to match. Now, as I said, whatever field is involved in your join conditions or your WHERE clause, if you index that field the operation is going to be much faster. By default, whatever field is the primary key of a table is automatically indexed by the database; you don't have to do it manually. Two fields are involved in this join condition: the id field from the projects table and the project_id field from the tasks table. Since id is the primary key of the projects table, we don't have to think about it; it is indexed automatically. But in the tasks table, project_id is a foreign key; it is involved in the join condition, and it is not indexed by default, because a foreign key is not the primary key of its table. That is the reasoning behind this part of the migration: we create an index on the project_id field of the tasks table, because we have a join condition, a LEFT JOIN matching on project_id, whenever we want to fetch all the tasks of a particular project.
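Under these assumptions (a projects table with primary key id and a tasks table with a project_id foreign key; names are illustrative), the query and its supporting index might look like:

```sql
-- Fetch every task belonging to one project (the join described above):
SELECT p.id AS project_id, t.*
FROM projects p
LEFT JOIN tasks t ON t.project_id = p.id
WHERE p.id = 42;

-- projects.id is covered automatically by its primary-key index;
-- the foreign-key side needs an explicit index:
CREATE INDEX idx_tasks_project_id ON tasks (project_id);
```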

That is why we want to make that operation a little faster, and that's why we are indexing the project_id field in the tasks table. In the same way, in the tasks table we have assigned_to, which is also a foreign key. Imagine another API where we want to fetch all the tasks of a particular user. assigned_to is a foreign key which references the users table, so it will have a corresponding entry in the users table with the id of that user. To implement that API we have to take the users table and join it with the tasks table on the condition that assigned_to matches users.id. Since id is the primary key of the users table, we don't have to think about that side; but in this table, assigned_to is a foreign key, so we have to index this field to make the join of the users table with the tasks table a little faster. That's why we are creating an index on the assigned_to field in the tasks table. Okay, next we are creating an index on the created_at field of the tasks table, in descending order, for the same reasoning: we want to return the list of all tasks to show on our front end, and by default the API returns all the tasks sorted by the

created_at field in descending order, because we want to return the latest tasks first, and we can fetch the latest tasks by sorting all the tasks in descending order. To make that operation a little faster, we are creating an index on the created_at field in the tasks table, in descending order. In the same way, we are creating an index on the status field, because in the "get all tasks" API let's say we have a filter condition, and with that filter we want to fetch only certain tasks. If we go to task_status, we created a custom enum type with four statuses: pending, in_progress, completed, and cancelled. Now let's imagine that in the "get all tasks" API we want to fetch all tasks which have the status pending. We have a requirement like that, and the front end, the user, can pass any status dynamically: pending, in progress, completed, whatever; and we want to quickly fetch all the tasks depending on the status of the task. The status field is involved in a WHERE clause, and what is the rule of thumb for creating an index? If a field is involved in a join condition, a WHERE clause, or a sort condition: those three are your primary guidance on when you should consider creating an index on that field. Since the status field is used in a WHERE clause, we can consider creating an index on the status field in the tasks table.
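A sketch, assuming the enum and column are named task_status and status as in the earlier migration (illustrative names):

```sql
-- status uses the task_status enum (pending, in_progress, completed, cancelled):
CREATE INDEX idx_tasks_status ON tasks (status);

-- The filter the API runs for whatever status the front end passes:
SELECT * FROM tasks WHERE status = 'pending';
```

As a design note, if queries only ever filter on one or two statuses, a partial index such as `CREATE INDEX ... ON tasks (created_at DESC) WHERE status = 'pending'` can be smaller and cheaper to maintain than indexing the whole column.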

Now, another thing: you should not create an index right away just because a field is used in a WHERE clause, a join condition, or an ORDER BY. You have to look at and evaluate how frequently that query is called, and whether the performance trade-off is worth it, because to maintain an index the database has to keep it up to date at all times. Say you have the project_id field in the tasks table, and in the index lookup we have the locations of all the matching task rows, all these entries. To keep that index reflecting the latest state of all the tasks, every time you insert a task or update a task, the index has to be accurately maintained; and to maintain it, on every insert or update the database has to do some extra work to store the latest state of the task in the index structure, which adds some overhead. Even though it is not a lot, and it depends on the size of your database and how many entries you have, index operations themselves can add some overhead. So as a backend engineer you have to evaluate whether creating an index is worth it, whether the overhead of maintaining the index is worth it, and that depends on how frequently your database query is

executed, and what kind of user experience you want to provide. Ultimately you have to consider a lot of parameters, but at a very high level, without thinking about all these micro-optimizations, you can use these three signals: is the field involved in a join condition, a WHERE clause, or a sort operation? If it is, and you see that the query is frequently called, then you can go ahead and create an index on it; and if in the future you see that the query is not being frequently called, you can just go ahead and delete that index. As a start, you can consider creating it and monitoring your performance. In the same way, we are creating an index on project_id in project_members, and an index on user_id in project_members, and so on; the same logic, the same kind of reasoning, applies everywhere. Okay, that's pretty much all about indexes.
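One practical way to do that evaluation is to ask Postgres for the actual query plan; a hedged sketch, reusing the illustrative index name from above:

```sql
-- See whether the planner actually uses the index:
EXPLAIN ANALYZE SELECT * FROM tasks WHERE status = 'pending';
-- A node like "Index Scan using idx_tasks_status on tasks" means it is used;
-- "Seq Scan on tasks" means it is not.

-- If the query turns out to be rare, the index can simply be removed:
DROP INDEX IF EXISTS idx_tasks_status;
```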

Now, coming to the triggers part. As I said, we want to automate the operation: every time we update a particular row in any table, we also want to update the updated_at field of that row, so that we don't have to do it manually from our application code. For that, we are creating a function; in Postgres we have the ability to create custom functions. We create a function, give it a name, say that it returns a trigger, and open the function body with BEGIN. What we are saying in the body is: take the row, set the updated_at field of that row to the current timestamp, and return the new row, with the new value of the updated_at field. Now we have a custom function that is ready to use. Next, we are creating triggers on the different tables; we take users, user_profiles, projects, and tasks, and create a trigger on each table. We give each trigger a name, so that it is easier to drop later in our down migration, and what we are saying is: every time some update operation happens on this table, go to whatever row is updated in that operation and execute this function; and what this function does is take the updated_at field and set it to the current timestamp. With that setup, every time we do an update operation, the updated_at field for that row is updated automatically. That's pretty much it: we create triggers for all the tables, and in the down migration we delete all the triggers, delete the function, and delete all the indexes that we created.
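The function and one of the triggers could be sketched like this (the function and trigger names are illustrative; the video's migration may use different ones):

```sql
-- Function that stamps updated_at on the row being written:
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS trigger AS $$
BEGIN
  NEW.updated_at = CURRENT_TIMESTAMP;  -- overwrite with the time of this write
  RETURN NEW;                          -- the row Postgres will actually store
END;
$$ LANGUAGE plpgsql;

-- One trigger per table, named so the down migration can DROP TRIGGER it:
CREATE TRIGGER users_set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
```

The trigger fires BEFORE UPDATE so the modification to NEW lands in the stored row; on Postgres versions older than 11 the last line would be EXECUTE PROCEDURE instead of EXECUTE FUNCTION.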

Then we can go ahead and execute these migrations: we run dbmate up, and it is applied. Now, going back to TablePlus, let's do that update operation again. We remove this; this was our earlier update operation. Let's execute it with, say, "updated bio 3", apply it, and go to this row. Now, when you look at the updated_at field, you can see that it shows the current time, while I am recording this video; the current date and time, the timestamp, accurately match. So our trigger worked correctly: we did an update operation on this particular row, and this field was updated automatically. That's the use of triggers.
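The check done in TablePlus boils down to something like this (the table and column names here are assumptions based on earlier parts of the video):

```sql
UPDATE user_profiles SET bio = 'updated bio 3' WHERE id = 1;

-- updated_at now reflects the moment of the UPDATE, set by the trigger:
SELECT bio, updated_at FROM user_profiles WHERE id = 1;
```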

Now, with that, we have created database queries for these APIs, but going through every remaining API and creating a database query for it is going to take a lot of time, at least a few hours more, and I don't want to make this video longer than it already is. So, with all the reasoning I have explained here, and all the SQL basics and Postgres basics you have already learned, you can go ahead and create the database queries for the rest of these APIs yourself, and you can see how the join conditions work, how the indexes come into play, how the triggers work, and so on. And that's pretty much all you need to know. Of course it's not literally all you need to know, but it is pretty much 80% of what you are going to be doing as a backend engineer when dealing with databases: you are going to take APIs, analyze what payload is coming from the user, construct a dynamic query depending on that, use parameterized queries to securely pass whatever user values you have, execute the query, and return the data. That's a very high-level walkthrough of what you'll be doing as a backend engineer when it comes to databases; of course there are a lot of other things, but this covers, at a very high level, most of what you'll be doing.
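As one plain-SQL illustration of the parameterized-query idea (in application code your driver's placeholder mechanism plays this role; the statement and type names here are assumptions):

```sql
-- The user's value travels as a parameter ($1), never spliced into the
-- SQL text, which is what protects against SQL injection:
PREPARE tasks_by_status (task_status) AS
  SELECT * FROM tasks
  WHERE status = $1
  ORDER BY created_at DESC;

EXECUTE tasks_by_status('pending');
```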